Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to Container CD #385

Closed
wants to merge 20 commits into from
Closed

Improvements to Container CD #385

wants to merge 20 commits into from

Conversation

tylertitsworth
Copy link
Contributor

@tylertitsworth tylertitsworth commented Jul 9, 2024

Description

Update CD to follow DevOps best practices.

How it works

Given an example(s) (AudioQnA), this workflow creates jobs that perform container CI on that example, and allow other event-based workflows to curate a matrix of these examples.

These are the inputs for the composites:

Build

# .github/workflows/composite/docker-build/action.yml
inputs:
  example_dir:
    description: Directory with docker-compose.yaml to build
    required: true
    type: string
  env_overrides:
    description: Bash Env Variable Overrides in `KEY=VAL && KEY2=VAL2` format
    required: false
    type: string
  registry:
    description: Container Registry URL
    required: false
    default: 'opea-project'
    type: string

Scan

# .github/workflows/composite/scan/action.yml
inputs:
  image-ref:
    description: 'image reference(for backward compatibility)'
    required: true
  output:
    description: 'writes results to a file with the specified file name'
    required: true

This is the inputs for the reusable workflow:

Container CI

# .github/workflows/container.ci.yaml
inputs:
  example_dir:
    required: true
    type: string
  scan:
    default: true
    required: false
    type: boolean
  test:
    default: true
    required: false
    type: boolean
  publish:
    default: false
    required: false
    type: boolean

These are the Events that this PR supports:

# .github/workflows/example-weekly-ci.yaml
on:
  # weekly
  schedule:
  - cron: "0 0 * * 0"
  # run all examples manually
  workflow_dispatch: null
  push:
    tags: ['**'] # whenever a tag is created/modified
# ./github/workflows/example-manual-ci.yaml
on:
  workflow_dispatch:
    inputs:
      examples:
        default: 'example1, example2'
        description: 'List of comma-separated examples to test'
        required: true
        type: string
      scan:
        default: true
        description: 'Scan all examples with Trivy'
        required: false
        type: boolean
      test:
        default: true
        description: 'Test Examples'
        required: false
        type: boolean
      publish:
        default: false
        description: 'Publish Images to Dockerhub'
        required: false
        type: boolean

Changes

  • Add StepSecurity network traffic monitoring during workflows via harden-runner
  • Tests labels are determined based on the test workflow filename:
# .github/workflows/container-ci.yaml
- name: Get Tests
  id: test-matrix
  run: |
    echo "matrix=$(find ${{ inputs.example_dir }}/tests -type f -name 'test_*.sh' -print | \
    sed -n 's|.*/\(test_[^_]*_on_\([^\.]*\)\).sh|\{"test": "\1", "hardware_label": "\2"}|p' | \
    jq -sc .)" >> $GITHUB_OUTPUT
  # ex: [{"test":"test_audioqna_on_xeon","hardware_label":"xeon"},{"test":"test_audioqna_on_gaudi","hardware_label":"gaudi"}]
...
runs-on: ${{ matrix.tests.hardware_label }}
...
- name: Run Test
  run: bash ${{ matrix.tests.test }}.sh
  env:
    REGISTRY: ${{ secrets.REGISTRY }}
  • Simplify developer workflow:
# Before
docker build -t container1
docker build -t container2
docker build -t container3
...
# After
docker compose build
  • Test only the latest version of that example's container
    • We pull the last known good version of the comps, and test the changes made in the example only.
    • Modifying the test file or docker-compose.yaml file for a given test can allow for testing specific versions of comp containers. No need to modify CI.
  • status-check flag in pull_request events that verifies if any job in example-ci.yaml failed.

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)

Dependencies

n/a

Tests

I don't have the runners required to do the validation portion, however when previously testing with ubuntu-latest I was able to launch the test script from the correct directory on that runner before it ran out of space.
https://github.com/tylertitsworth/GenAIExamples/actions/runs/9896085740?pr=3

@tylertitsworth tylertitsworth self-assigned this Jul 9, 2024
Copy link

@claynerobison claynerobison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All I see is the minor doc change. other than that, seems like an elegant architecture improvement.

@chensuyue
Copy link
Collaborator

chensuyue commented Jul 10, 2024

How does this new structure combined with current compose and k8s functionality test? Our test machine are not able to access an internal docker registry, so we have 2 local registry for CI test. We need to sync about the design.

Copy link
Collaborator

@ashahba ashahba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tylertitsworth for the PR.
My change requests are mostly minor except for introducing dependency on ai-containers/.github@main which needs to be reviewed by the community.

.github/workflows/container-ci.yml Outdated Show resolved Hide resolved
# Get diff array filtered by specific filetypes
DIFF=$(git diff --diff-filter=d \
--name-only ${{ github.event.pull_request.base.sha }}...${{ github.event.pull_request.head.sha }} \
-- '*/*Dockerfile' '*.py' '*.yaml' '*.yml' '*.sh' '*/*requirements.txt' '*.json' '*.ts' '*.js' | \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be a few more file types and extensions:
*.dockerfile, *Dockerfile*, *.cjs, *.svelte and etc

.github/workflows/reuse-container-ci.yaml Outdated Show resolved Hide resolved
.github/workflows/reuse-container-ci.yaml Outdated Show resolved Hide resolved
.github/workflows/reuse-container-ci.yaml Outdated Show resolved Hide resolved
SearchQnA/docker/xeon/README.md Outdated Show resolved Hide resolved
SearchQnA/docker-compose.yaml Outdated Show resolved Hide resolved
SearchQnA/docker-compose.yaml Outdated Show resolved Hide resolved
Translation/docker/gaudi/README.md Outdated Show resolved Hide resolved
Translation/docker/xeon/README.md Outdated Show resolved Hide resolved
@tylertitsworth
Copy link
Contributor Author

tylertitsworth commented Jul 10, 2024

How does this new structure combined with current compose and k8s functionality test? Our test machine are not able to access an internal docker registry, so we have 2 local registry for CI test. We need to sync about the design.

@chensuyue the current compose structure could be merged with this design. It's all about having a consistent file system and making sure your dependencies are in subfolders so you can accurately track changes to individual containers.

Regarding the local CI, we can change the runner labels and just make sure to use the correct secrets.

Regarding k8s, I believe we previously spoke about how to best test helm charts. We have a composite action for this. However I see a lot of manifests so it might be better to utilize the baremetal test runner mode to call your existing test scripts to validate those manifests. If I were smarter I would've made a branch rather than a fork so I could actually test these things, it's hard to integrate GitHub CI without maintainer access.

@tylertitsworth
Copy link
Contributor Author

My change requests are mostly minor except for introducing dependency on ai-containers/.github@main which needs to be reviewed by the community.

Thanks @ashahba, it's common practice to re-use GitHub actions from other locations. I hope that the community will accept another intel repository as a trusted and maintained source similar to how they trust GitHub, docker, helm, pre-commit.ci, and other open source tools already in use.

If the fear of a dependency failing is important. These actions can be pinned at a specific SHA to avoid failures like I did with many other actions introduced by this workflow.

tylertitsworth added 8 commits July 10, 2024 12:21
Signed-off-by: tylertitsworth <[email protected]>

add chatqna support

Signed-off-by: tylertitsworth <[email protected]>

update tests and repo path

Signed-off-by: tylertitsworth <[email protected]>

add actions.json file for ci configuration

Signed-off-by: tylertitsworth <[email protected]>

add codegen

Signed-off-by: tylertitsworth <[email protected]>

add weekly test

Signed-off-by: tylertitsworth <[email protected]>

remove ref

Signed-off-by: tylertitsworth <[email protected]>

change runner label

Signed-off-by: tylertitsworth <[email protected]>

do the rest of them

Signed-off-by: tylertitsworth <[email protected]>

copy workflow call into repo due to gha bug #2877542

Signed-off-by: tylertitsworth <[email protected]>

configure no start

Signed-off-by: tylertitsworth <[email protected]>

patch scan and test wfs

Signed-off-by: tylertitsworth <[email protected]>

change to more descriptive name for reusable workflow

Signed-off-by: tylertitsworth <[email protected]>

additional bugfix

Signed-off-by: tylertitsworth <[email protected]>

update scan workflow

Signed-off-by: tylertitsworth <[email protected]>

see help menus

Signed-off-by: tylertitsworth <[email protected]>

add missing env

Signed-off-by: tylertitsworth <[email protected]>

use downcase method

Signed-off-by: tylertitsworth <[email protected]>

remove test

Signed-off-by: tylertitsworth <[email protected]>

use valid group dir for rest of scan workflow

Signed-off-by: tylertitsworth <[email protected]>

return import step

Signed-off-by: tylertitsworth <[email protected]>

add and validate smoke tests

Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
@tylertitsworth tylertitsworth marked this pull request as ready for review July 10, 2024 22:04
@tylertitsworth tylertitsworth added feature New feature or request and removed WIP labels Jul 10, 2024
- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7
- uses: docker/login-action@0d4c9c5ea7693da7b068278f7b52bda2a190a446 # v3.2.0
with:
registry: ${{ secrets.REGISTRY }}
Copy link
Collaborator

@chensuyue chensuyue Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to use a common private registry on IDC machine? Now we have 2 local registry separately for xeon and gaudi cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can use something like harbor that can be accessed by both clusters or a cloud registry that would be ideal.

If your runners have pre-authenticated access to these local registries, then we could remove the login commands/secrets and use environment variables that are present in the runner environment.

tylertitsworth added 8 commits July 11, 2024 09:26
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: Tyler Titsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
@tylertitsworth tylertitsworth changed the title Improvements to Container CI Improvements to Container CD Jul 12, 2024
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
tylertitsworth added 2 commits July 12, 2024 11:15
Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: tylertitsworth <[email protected]>
wangkl2 pushed a commit to wangkl2/GenAIExamples that referenced this pull request Dec 11, 2024
* refactor build image build and push workflow

Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants